Data-driven Calibration of Penalties for Least-Squares Regression
Authors
Abstract
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from the data. We propose a completely data-driven calibration algorithm for this parameter in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach relies heavily on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristic for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
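The abstract does not spell out the calibration steps, so the following is only a minimal sketch of one common reading of the slope heuristics (the "dimension jump" variant), applied to regressogram bin-width selection. Several things here are assumptions rather than statements from the abstract: the penalty shape D/n, the rule that the final penalty is twice the estimated minimal penalty, the grid of candidate constants, and the function names `regressogram_risk` and `select_by_dimension_jump`, which are hypothetical.

```python
import numpy as np


def regressogram_risk(x, y, n_bins):
    """Empirical least-squares risk of a regressogram with n_bins
    equal-width bins on [0, 1] (x is assumed to lie in [0, 1])."""
    # Bin index of each point; clip so that x == 1.0 falls in the last bin.
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    # Per-bin mean of y; empty bins get 0 and never contribute to the risk.
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return float(np.mean((y - means[idx]) ** 2))


def select_by_dimension_jump(risks, dims, n, c_grid):
    """Estimate the minimal-penalty constant from the largest drop in the
    selected dimension, then select a model with twice that constant.

    Assumed penalty shape: dims / n. Assumed rule: optimal = 2 * minimal."""
    shape = dims / n
    # Dimension selected for each candidate constant c.
    chosen = np.array([dims[np.argmin(risks + c * shape)] for c in c_grid])
    # Largest drop in selected dimension between consecutive constants.
    drops = chosen[:-1] - chosen[1:]
    j = int(np.argmax(drops))
    c_min = float(c_grid[j + 1])
    d_hat = int(dims[np.argmin(risks + 2.0 * c_min * shape)])
    return c_min, d_hat


if __name__ == "__main__":
    # Synthetic heteroscedastic data, purely illustrative.
    rng = np.random.default_rng(0)
    n = 500
    x = rng.uniform(0.0, 1.0, size=n)
    noise_sd = 0.2 + 0.6 * x  # noise level grows with x
    y = np.sin(2 * np.pi * x) + noise_sd * rng.normal(size=n)

    dims = np.arange(1, 101)
    risks = np.array([regressogram_risk(x, y, d) for d in dims])
    c_grid = np.linspace(0.0, 2.0, 201)
    c_min, d_hat = select_by_dimension_jump(risks, dims, n, c_grid)
    print("estimated minimal constant:", c_min)
    print("selected number of bins:", d_hat)
```

The grid search keeps the sketch short; the estimated constant is only as fine as the spacing of `c_grid`, and the largest-drop rule can be unstable when several drops of similar size occur.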
Similar resources
Spectrophotometric Simultaneous Kinetic Determination of Iodide and Iodate Using Partial Least-Squares Calibration Method in a Single Kinetic Run
A rapid, sensitive and versatile kinetic method is presented for the simultaneous spectrophotometric determination of iodide and iodate by partial least-squares regression (PLS) using original and derivative data named as absorbance and rate data. The method is based on the catalytic effect of the cited anions on the reaction rate between Ce(IV) and As(III) in 2 mol l⁻¹ sulfuric acid medium. The ...
Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression
We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is proportional to the dimension of the model, at least for some typical heteroscedastic model selection problems. In particular, Mallows’ Cp is suboptimal in this framework, as well as any “linear” penalty depe...
Determination of Protein and Moisture in Fishmeal by Near-Infrared Reflectance Spectroscopy and Multivariate Regression Based on Partial Least Squares
The potential of Near Infrared Reflectance Spectroscopy (NIRS) as a fast method to predict the Crude Protein (CP) and Moisture (M) content in fishmeal by scanning spectra between 1000 and 2500 nm using multivariate regression technique based on Partial Least Squares (PLS) was evaluated. The coefficient of determination in calibration (R2C) and Standard Error of Calibra...
Simultaneous spectrophotometric determination of lead, copper and nickel using xylenol orange by partial least squares
A partial least squares (PLS) calibration model was developed for the simultaneous spectrophotometric determination of Pb (II), Cu (II) and Ni (II) using xylenol orange as a chromogenic reagent. The parameters controlling behavior of the system were investigated and optimum conditions were selected. The calibration graphs were linear in the ranges of 0.0–9.091, 0.0–2.719 and 0.0–2.381 ppm for lead...
Combined Sum of Squares Penalties for Molecular Divergence Time Estimation
Estimates of molecular divergence times when rates of evolution vary require the assumption of a model of rate change. Brownian motion is one such model, and since rates cannot become negative, a log Brownian model seems appropriate. Divergence time estimates can then be made using weighted least squares penalties. As sequences become long, this approach effectively becomes equivalent to penali...
Journal:
- Journal of Machine Learning Research
Volume 10, Issue
Pages -
Publication date: 2009